Classifying with confidence from incomplete information

نویسندگان

  • Nathan Parrish
  • Hyrum S. Anderson
  • Maya R. Gupta
  • Dun-Yu Hsiao
چکیده

We consider the problem of classifying a test sample given incomplete information. This problem arises naturally when data about a test sample is collected over time, or when costs must be incurred to compute the classification features. For example, in a distributed sensor network only a fraction of the sensors may have reported measurements at a certain time, and additional time, power, and bandwidth is needed to collect the complete data to classify. A practical goal is to assign a class label as soon as enough data is available to make a good decision. We formalize this goal through the notion of reliability—the probability that a label assigned given incomplete data would be the same as the label assigned given the complete data, and we propose a method to classify incomplete data only if some reliability threshold is met. Our approach models the complete data as a random variable whose distribution is dependent on the current incomplete data and the (complete) training data. The method differs from standard imputation strategies in that our focus is on determining the reliability of the classification decision, rather than just the class label. We show that the method provides useful reliability estimates of the correctness of the imputed class labels on a set of experiments on time-series data sets, where the goal is to classify the time-series as early as possible while still guaranteeing that the reliability threshold is met.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Optimal (R, Q) policy and pricing for two-echelon supply chain with lead time and retailer’s service-level incomplete information

Many studies focus on inventory systems to analyze different real-world situations. This paper considers a two-echelon supply chain that includes one warehouse and one retailer with stochastic demand and an up-to-level policy. The retailer’s lead time includes the transportation time from the warehouse to the retailer that is unknown to the retailer. On the other hand, the warehouse is unaware ...

متن کامل

A Target Classification Decision Aid

A submarine's sonar team is responsible for detecting, localising and classifying targets using information provided by the platform's sensor suite. The information used to make these assessments is typically uncertain and/or incomplete and is likely to require a measure of confidence in its reliability. Moreover, improvements in sensor and communication technology are resulting in increased am...

متن کامل

On Incomplete XML Documents with Integrity Constraints

We consider incomplete specifications of XML documents in the presence of schema information and integrity constraints. We show that integrity constraints such as keys and foreign keys affect consistency of such specifications. We prove that the consistency problem for incomplete specifications with keys and foreign keys can always be solved in NP. We then show a dichotomy result, classifying t...

متن کامل

Taxonomy of Global Air Transport

Data from the United Nations and the International Civil Aviation Organization Information Systems were used as a base for characterizing, classifying and comparing air transport demand and supply features of 156 countries. Relevant data from 1980 were chosen to reflect five sets of characteristics namely, air transport, 50cm-economic status, population demography, geographical and environmenta...

متن کامل

Fuzzy multi-criteria decision making method based on fuzzy structured element with incomplete weight information

The fuzzy structured element (FSE) theory is a very useful toolfor dealing with fuzzy multi-criteria decision making (MCDM)problems by transforming the criterion value vectors of eachalternative into the corresponding criterion function vectors. Inthis paper, some concepts related to function vectors are firstdefined, such as the inner product of two function vectors, thecosine of the included ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • Journal of Machine Learning Research

دوره 14  شماره 

صفحات  -

تاریخ انتشار 2013